We design and analyze minimax-optimal algorithms for online linear optimization games where the player's choice is unconstrained. The player strives to minimize regret, the difference between his loss and the loss of a post-hoc benchmark strategy. The standard benchmark is the loss of the best strategy chosen from a bounded comparator set. When the comparison set and the adversary's gradients satisfy $L_\infty$ bounds, we give the value of the game in closed form and prove it approaches $\sqrt{2T/\pi}$ as $T \to \infty$.

Interesting algorithms result when we consider soft constraints on the comparator, rather than restricting it to a bounded set. As a warmup, we analyze the game with a quadratic penalty. The value of this game is exactly $T/2$, and this value is achieved by perhaps the simplest online algorithm of all: unprojected gradient descent with a constant learning rate. We then derive a minimax-optimal algorithm for a much softer penalty function. This algorithm achieves good bounds under the standard notion of regret for any comparator point, without needing to specify the comparator set in advance. The value of this game converges to $\sqrt{e}$ as $T \to \infty$; we give a closed form for the exact value as a function of $T$. The resulting algorithm is natural in unconstrained investment or betting scenarios, since it guarantees at worst constant loss, while allowing for exponential reward against an "easy" adversary.

1 Introduction



Minimax analysis has recently been shown to be a powerful tool for the construction of online learning algorithms [Rakhlin et al., 2012]. Generally, these results use bounds on the value of the game (often based on the sequential Rademacher complexity) in order to construct efficient algorithms. In this work, we show that when the learner is unconstrained, it is often possible to efficiently compute an exact minimax strategy.

We consider a game where on each round $t = 1, \dots, T$, first the learner selects $x_t \in \mathbb{R}^n$, and then an adversary chooses $g_t \in \mathcal{G} \subset \mathbb{R}^n$, and the learner suffers loss $g_t \cdot x_t$. The goal of the learner is to minimize regret, that is, loss in excess of that achieved by a benchmark strategy. We define

$$\text{Regret} = \text{Loss} - (\text{Benchmark Loss}) = \sum_{t=1}^{T} g_t \cdot x_t - L(g_1, \dots, g_T) \qquad (1)$$

as the regret with respect to benchmark performance $L$ (the $L$ intended will be clear from context). Letting $I(x \in \mathcal{X}) = 0$ for $x \in \mathcal{X}$ and $\infty$ otherwise, the standard definition of regret arises from the choice

$$L(g_1, \dots, g_T) = \min_{x} \; g_{1:T} \cdot x + I(x \in \mathcal{X}), \qquad (2)$$

the loss of the best strategy in a bounded convex set $\mathcal{X}$ (we write $g_{1:t} = \sum_{s=1}^{t} g_s$ for a sum of scalars or vectors). When $L$ depends only on the sum $G = g_{1:T}$ we write $L(G)$. We will be able to interpret the alternative benchmarks $L$ we consider as penalties $\Psi$ on comparator points, so $L(G) = \min_x G \cdot x + \Psi(x)$, where $\Psi(x)$ has replaced $I(x \in \mathcal{X})$ in Eq. (2).

We view this interaction as a sequential zero-sum game played over $T$ rounds, where the player strives to minimize Eq. (1), and the adversary attempts to maximize it. We study the value of this game, $V^T$, and design minimax optimal algorithms for the player; formal definitions are given below. Some results are more naturally stated in terms of rewards rather than losses, and so we define $\text{Reward} = -\text{Loss} = -\sum_{t=1}^{T} g_t \cdot x_t$.



Outline and Summary of Results Section 2 provides motivation for the consideration of alternative benchmarks $L$. Section 3 then develops several theoretical tools for analyzing unconstrained games with concave benchmark functions $L$. Section 4 applies this theory to three particular instances; Figure 1 summarizes the results from this section. These games exhibit a strong combinatorial structure, which leads to interesting algorithms and perhaps surprising game values.

Section 4.1 serves as a warmup, where we show that constant step-size gradient descent is in fact minimax optimal for a natural choice of $L$, which can be thought of as replacing the hard feasible set $\mathcal{X}$ in Eq. (2) with a quadratic penalty function on comparator points. Section 4.2 provides results analogous to those of Abernethy et al. [2008]: we consider regret compared to the best $x$ where $\|x\|_\infty \le 1$ against an adversary constrained to play $\|g_t\|_\infty \le 1$, while Abernethy et al. considered $\|g_t\|_2 \le 1$ and $\|x\|_2 \le 1$ for $n \ge 3$ dimensions. Interestingly, while we prove results for the unconstrained player, we show the optimal strategy in fact always plays points from $\mathcal{X} = \{x \mid \|x\|_\infty \le 1\}$, and so applies to the constrained case as well. Our results hold for the $n = 1$ case (where $L_2$ and $L_\infty$ coincide), showing that the value of the game approaches $\sqrt{2T/\pi}$ as $T \to \infty$, as opposed to $\sqrt{T}$ as one might extrapolate from the results of Abernethy et al. This indicates an interesting change in the geometry of the $L_2$ game between $n = 1$ and $n = 3$. Finally, Section 4.3 gives a minimax optimal algorithm for the setting introduced by Streeter and McMahan [2012]. Following their work, our algorithm obtains standard regret at most $O(R\sqrt{T}\log((1 + R)T))$ simultaneously for any comparator $x$ with $|x| = R$, without needing to choose $R$ in advance. However, we emphasize a slightly different interpretation of this setting, discussed in Section 2. It is worth noting that the regret (relative to the respective $L$) of these algorithms is $O(T)$, $O(\sqrt{T})$, and $O(1)$, respectively, though all three are minimax algorithms.



The Minimax Value of the Game Given a benchmark function $L$, the minimax value of the game is

$$V^T = \left(\inf_{x_t} \sup_{g_t}\right)_{t=1}^{T} \; \sum_{t=1}^{T} g_t \cdot x_t - L(g_1, \dots, g_T), \qquad (3)$$
Figure 1: Summary of online linear games considered in this paper. Results are stated for the one-dimensional problem where $g_t \in [-1, 1]$; Corollary 2 gives an extension to $n$ dimensions. The benchmark $L$ is given as a function of $G = g_{1:T}$. The standard notion of regret corresponds to $L(G) = \min_{x \in [-1,1]} g_{1:T} \cdot x = -|G|$. The benchmark functions can alternatively be derived from a suitable penalty $\Psi$ on comparator points $x$, so $L(G) = \min_x G x + \Psi(x)$.



where $\left(\inf_{x_t} \sup_{g_t}\right)_{t=1}^{T}$ is a shorthand notation for $\inf_{x_1} \sup_{g_1} \cdots \inf_{x_T} \sup_{g_T}$. Against a worst-case adversary, any algorithm must incur regret at least $V^T$, and the minimax optimal algorithm will incur regret at most $V^T$ against any adversary. Since in this work we study minimax algorithms, we will often use the value of the game $V^T$ as an upper bound on Regret (as defined in Eq. (1)). Generally we will not assume our adversaries are minimax optimal.

We are also concerned with the conditional value of the game, $V_t$, given that $x_1, \dots, x_t$ and $g_1, \dots, g_t$ have already been played. That is, the Regret when we fix the plays on the first $t$ rounds, and then assume minimax optimal play for rounds $t + 1$ through $T$. However, following the approach of Rakhlin et al. [2012], we omit the terms $x_s \cdot g_s$ for $s \le t$ from Eq. (3). We can view this as cost that the learner has already paid, and neither that cost nor the specific previous plays of the learner impact the value of the remaining terms in Eq. (1). Thus, we define



$$V_t(g_1, \dots, g_t) = \left(\inf_{x_s} \sup_{g_s}\right)_{s=t+1}^{T} \; \sum_{s=t+1}^{T} g_s \cdot x_s - L(g_1, \dots, g_T). \qquad (4)$$



Note the conditional value of the game before anything has been played, $V_0()$, is exactly $V^T$.



Related Work Regret-based analysis has received extensive attention in recent years; see Shalev-Shwartz and Cesa-Bianchi and Lugosi [2006] for an introduction. The analysis of alternative notions of regret is also not new. In the expert setting, there has been much work on tracking a shifting sequence of experts rather than the single best expert; see Koolen et al. [2012] and references therein. Zinkevich [2003] considers drifting comparators in an online convex optimization framework. This notion can be expressed by an appropriate $L(g_1, \dots, g_T)$, but now the order of the gradients matters, unlike the benchmarks $L$ considered in this work. Merhav et al. [2006] and Dekel et al. [2012] consider the stronger notion of policy regret in the online experts and bandit settings, respectively. For investing scenarios, Agarwal et al. [2006] and Hazan and Kale [2009] consider regret with respect to the best constant-rebalanced portfolio.
More recently, the field has seen minimax approaches to online learning. Abernethy and Warmuth [2010] give a minimax strategy for several zero-sum games against a budgeted adversary. Section 4.2 studies the online linear game of Abernethy et al. [2008] under different assumptions, and we adapt some techniques from Abernethy et al. [2009]. Rakhlin et al. [2012] take powerful tools for non-constructive analysis of online learning problems and show they can be used to design algorithms; our work differs in that we focus on cases where the exact minimax strategy can be computed.



2 Alternative Notions of Regret 

One of our contributions is showing that interesting results can be obtained by choosing $L$ differently than in Eq. (2); in particular, we obtain minimax optimal algorithms for the problem considered by Streeter and McMahan [2012] by analyzing an appropriate choice of $L$.

One could choose $L(G) = 0$, but this leads to an uninteresting game: the adversary has no long-term constraints, and so can simply pick $g_t$ to maximize $g_t x_t$ for whatever $x_t$ the player selected. Thus, the player can do no better than always picking $x_t = 0$. This is exactly the reason for studying the standard notion of regret: we do not require that we do well in absolute terms, but rather relative to the best strategy from a fixed set.

That is, interesting games result when the player accepts the fact that it is impossible to do well in terms of the absolute loss $\sum_t g_t x_t$ for all sequences $g_1, \dots, g_T$. However, the player can do better on some sequences at the expense of doing worse on others. The benchmark function $L$ makes this notion precise: sequences for which $L(g_1, \dots, g_T)$ is large and negative are those on which the player desires good performance,^1 at the expense of allowing more loss (in absolute terms) on sequences where $L(g_1, \dots, g_T)$ is large and positive. The value of the game $V^T$ tells us to what extent any online algorithm can hope to match the benchmark performance $L$. It follows by definition that if we add a constant $k$ to $L$ (making $L$ easier to achieve), we decrease the minimax value of the game by $k$, without changing the minimax optimal strategy.

We can use these ideas to derive algorithms for a setting that is quite different from typical online convex optimization. On each round $t$, the world (possibly adversarial, possibly not) offers the player a betting opportunity on a binary outcome; the player can take either side of the bet. The player begins with \$1, but on later rounds can wager up to whatever amount he currently has (based on previous wins and losses). The player selects an amount $x_t$ to bet (the sign of $x_t$ encodes which side of the bet he takes), and then the world reveals whether the bet was won or lost; if the player won the bet, he receives $|x_t|$ dollars; otherwise, he loses $|x_t|$ dollars. The player's net winnings are $-\sum_t g_t x_t$, where $g_t \in \{-1, 1\}$; the player wins the bet when $\operatorname{sign}(x_t) \ne g_t$ (thus, the player strives to minimize $\sum_t g_t x_t$). How should the player bet in this game? Clearly if the world is adversarial, we cannot do better than always betting $x_t = 0$. But we might have reason to believe the world is not fully adversarial; if we knew $g_t = 1$ with a fixed probability $p$, then following a Kelly betting scheme [Kelly Jr., 1956] might be appropriate, but knowing $p$ is often unrealistic in practice.

^1 It can be useful to think about $-L(G)$ as the benchmark reward for the sequence with gradient sum $G$.

If the player is familiar with online linear optimization, he might try projected online gradient descent [Zinkevich, 2003] with a constant step size.^2 If we restrict our bets to the feasible set $[-B, B]$, letting $G = g_{1:T}$, this algorithm guarantees $\text{Regret} = \text{Loss} + B|G| \le 2B\sqrt{T}$. Then $\text{Winnings} = -\text{Loss} \ge B|G| - 2B\sqrt{T}$. Thus in the best case (when $|G| = T$) the player ends up with a little less than $BT$; but he can lose up to $B\sqrt{T}$ when $G = 0$. Thus, to ensure he loses no more than the \$1 he has on hand, he must choose $B = 1/\sqrt{T}$. With this restriction, in the best case the player wins less than $\sqrt{T}$ dollars. However, the post-hoc optimal strategy would have been to bet everything every round, netting winnings of $2^T$. Despite the theoretical guarantees, the player certainly might feel regret at having won only $\sqrt{T}$ in this situation!
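The gap described above can be illustrated numerically. The following is a minimal sketch, not taken from the paper: the function name `projected_ogd_winnings` and the step size $\eta = 2B/\sqrt{T}$ (a common choice for a feasible set of width $2B$) are assumptions made only for illustration. It runs projected gradient descent on the "easy" sequence $g_t = +1$ with $B = 1/\sqrt{T}$.

```python
import math


def projected_ogd_winnings(gradients, B):
    """Projected online gradient descent on [-B, B]; returns net winnings -sum(g_t * x_t)."""
    T = len(gradients)
    eta = 2.0 * B / math.sqrt(T)  # assumed constant step size (illustrative choice)
    x, loss = 0.0, 0.0
    for g in gradients:
        loss += g * x                     # loss suffered on this round
        x = max(-B, min(B, x - eta * g))  # gradient step followed by projection
    return -loss


T = 100
B = 1.0 / math.sqrt(T)                    # keeps the worst-case loss near one dollar
easy_winnings = projected_ogd_winnings([1] * T, B)
print(easy_winnings, math.sqrt(T))        # winnings stay below sqrt(T)
```

Even on this most favorable sequence the winnings remain below $\sqrt{T}$, in contrast to the $2^T$ available to the post-hoc optimal bettor.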

One might also hope to use online algorithms for portfolio management, for example those of Hazan and Kale [2009] and Agarwal et al. [2006]. However, these algorithms require the assumption that you always retain at least an $\alpha > 0$ fraction of your bet, which is directly violated in our game.

By carefully crafting a suitable benchmark function $L$, we can provide the player with a more satisfying algorithm. Ideally, we would like an $L$ that satisfies three properties: 1) there exists an algorithm where regret is bounded by a constant $\epsilon$ (for any $T$) with respect to $L$, 2) $-L(G) \ge 0$, and 3) $-L(G)$ grows exponentially in $|G|$. Properties 1) and 2) ensure the player never loses more than $\epsilon$ running this algorithm; by scaling the bets the algorithm suggests by $1/\epsilon$, he can ensure he never loses more than his starting \$1. Property 3) implies that for "easy" sequences, we get exponential reward; in fact, given 1) and 2) we would like $-L(G)$ to grow as quickly as possible.

Of course, if the adversary chooses $g_t$ uniformly at random from $\{-1, 1\}$ each round, we expect to frequently see $|G| \ge \sqrt{T}$, and so intuitively we will not be able to guarantee exponential winnings. This suggests the best we might hope for is a function like $L(G) = -\exp\left(|G|/\sqrt{T}\right)$. In fact, in Section 4.3 we show that constant regret against such a benchmark function is possible, and we derive a minimax algorithm.



A Comparator Set Interpretation The classic definition of regret defines $L$ indirectly as the loss of the best strategy from a fixed class $\mathcal{X}$ in hindsight, Eq. (2). As this work shows, it can be advantageous to state $L$ as an explicit function of $G$; however, useful intuition can be gained by interpreting $L$ as a penalty function on comparator points $x$. That is, we wish to find a $\Psi$ such that

$$L(G) = \min_{x} \; G x + \Psi(x).$$

For the benchmark functions $L$ we consider, we also derive the corresponding penalty functions $\Psi$ using convex conjugates. These are summarized in our results in Figure 1.

The standard notion of regret corresponds to a hard penalty $\Psi(x) = I(x \in \mathcal{X})$. Such a definition makes sense when the player by definition must select a strategy from some bounded set, for example a probability vector from the $n$-dimensional simplex, or a distribution on paths in a graph. For such problems, standard regret is really comparing the player's performance to that of any fixed feasible strategy chosen with knowledge of $g_1, \dots, g_T$; by putting an equal penalty on each of them, we do not indicate any prior belief that some strategies are more likely to be optimal than others.

^2 Any other algorithm that provides a bound on standard regret of $O(B\sqrt{T})$ will behave similarly.
However, in contexts such as machine learning where any $x \in \mathbb{R}^n$ corresponds to a valid model, such a hard constraint is difficult to justify; while any $x \in \mathbb{R}^n$ is technically feasible, in order to prove regret bounds we compare to a much more restrictive set. As an alternative, in Sections 4.1 and 4.3 we propose soft penalty functions that encode the belief that points near the origin are more likely to be optimal (we can always re-center the problem to match our beliefs in this regard), but do not rule out any $x \in \mathbb{R}^n$ a priori.

3 General Unconstrained Linear Optimization 

In this section we prove a theorem that greatly simplifies the task of computing minimax values and deriving algorithms for the games we consider. We prove this result in the one-dimensional case; Corollary 2 then extends the result to $n$ dimensions.

Theorem 1. Consider the one-dimensional unconstrained game where the player selects $x_t \in \mathbb{R}$ and the adversary chooses $g_t \in \mathcal{G} = [-1, 1]$, and $L$ is concave in each of its arguments and bounded below on $\mathcal{G}^T$. Then,

$$V^T = \mathbb{E}_{g_t \sim \{-1, 1\}} \left[ -L(g_1, \dots, g_T) \right],$$

where the expectation is over each $g_t$ chosen independently and uniformly from $\{-1, 1\}$ (that is, the $g_t$ are Rademacher random variables). Further, the conditional value of the game is

$$V_t(g_1, \dots, g_t) = \mathbb{E}_{g_{t+1}, \dots, g_T \sim \{-1, 1\}} \left[ -L(g_1, \dots, g_t, g_{t+1}, \dots, g_T) \right]. \qquad (5)$$

Proof. We argue by backwards induction (from $t = T$ to $t = 1$) on the conditional value of the game, with the induction hypothesis that

$$V_t(g_1, \dots, g_t) = \mathbb{E}_{g_{t+1}, \dots, g_T \sim \{-1, 1\}} \left[ -L(g_1, \dots, g_T) \right], \qquad (6)$$

and further that $V_t$ is convex in each of its arguments and bounded above on $\mathcal{G}^t$. The induction hypothesis holds trivially for $t = T$, using the assumption that $L$ is concave and bounded below for the second part. Now, suppose the induction hypothesis holds for $t$. We then have (by the definition of $V_t$)

$$V_{t-1}(g_1, \dots, g_{t-1}) = \inf_{x_t} \sup_{g_t} \; g_t x_t + V_t(g_1, \dots, g_{t-1}, g_t).$$

Note $V_{t-1}$ must be convex in each of its arguments, using the induction hypothesis on $V_t$. Let $M(g, x) = g x + V_t(g_1, \dots, g_{t-1}, g)$. We would like to appeal to the minimax theorem to switch the inf and sup, but since $M$ is convex in $g$ (using the induction hypothesis) rather than concave, we cannot do so immediately. However, because we are choosing $g_t \in [-1, 1]$, it follows from the convexity of $M$ that the supremum is attained at either $-1$ or $+1$. Thus, we can write

$$\begin{aligned}
V_{t-1}(g_1, \dots, g_{t-1}) &= \inf_{x_t} \sup_{g_t \in [-1, 1]} M(g_t, x_t) \\
&= \inf_{x_t} \sup_{g_t \in \{-1, 1\}} M(g_t, x_t) \\
&= \inf_{x_t} \sup_{p_t \in \Delta(\{-1, 1\})} \mathbb{E}_{g_t \sim p_t} \left[ M(g_t, x_t) \right],
\end{aligned}$$
where $p_t \in [0, 1]$ is the probability the adversary chooses $g_t = +1$ (otherwise, $g_t = -1$). Now $\mathbb{E}_{g_t \sim p_t}[M(g_t, x_t)]$ is linear in both $p_t$ and $x_t$, and so we can apply the minimax theorem (e.g., Theorem 7.1 from Cesa-Bianchi and Lugosi [2006]), which gives

$$\begin{aligned}
V_{t-1}(g_1, \dots, g_{t-1}) &= \sup_{p_t \in \Delta(\{-1, 1\})} \inf_{x_t} \; \mathbb{E}_{g_t \sim p_t} \left[ g_t x_t + V_t(g_1, \dots, g_{t-1}, g_t) \right] \\
&= \sup_{p_t \in \Delta(\{-1, 1\})} \inf_{x_t} \; \mathbb{E}_{g_t \sim p_t} \left[ g_t x_t \right] + \mathbb{E}_{g_t \sim p_t} \left[ V_t(g_1, \dots, g_{t-1}, g_t) \right].
\end{aligned}$$

Now, the adversary (sup player) must choose $p_t = 0.5$ so that $\mathbb{E}[g_t] = 0$, or otherwise the player can choose $x_t$ to drive the value to $-\infty$ (since $V_t$ is bounded above). Thus, the first expectation term disappears, and the choice of the player becomes irrelevant, giving

$$V_{t-1}(g_1, \dots, g_{t-1}) = \mathbb{E}\left[ V_t(g_1, \dots, g_{t-1}, g_t) \right],$$

where now the expectation is on $g_t$ drawn uniformly from $\{-1, 1\}$. Applying the induction hypothesis completes the proof, since then iterated expectation yields Eq. (6) for $V_{t-1}$, and boundedness is immediate. □

The use of randomization to allow the application of the minimax theorem is similar to the technique used by Abernethy et al. [2009].

A key insight from the proof is that an optimal adversary can always select from $\{-1, 1\}$. With this knowledge, we can view the game as a binary tree of height $T$. An algorithm for the player simply assigns a play $x \in \mathbb{R}$ to each node, and the adversary chooses which outgoing edge to take: if the adversary chooses the left edge, the player suffers loss $x$, otherwise the player wins $x$ (suffers loss $-x$). Finally, when leaf $\ell$ is reached, the adversary pays the player some amount $L(\ell)$. Theorem 1 implies the value of the game is then simply the average value of $-L(\ell)$.
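The tree view suggests a direct, if exponential-time, way to evaluate small games: average $-L$ over all $2^T$ leaves. The sketch below assumes this reading of Theorem 1; the function names and the two example benchmarks (the quadratic benchmark with $\sigma = 1$ and the absolute-value benchmark, both discussed elsewhere in the paper) are chosen here purely for illustration.

```python
from itertools import product


def value_by_enumeration(L, T):
    """Average of -L over every leaf of the game tree, i.e. every sequence in {-1, +1}^T."""
    leaves = list(product((-1, 1), repeat=T))
    return sum(-L(g) for g in leaves) / len(leaves)


def quadratic_benchmark(g, sigma=1.0):
    """L(G) = -G^2 / (2 sigma), evaluated on a gradient sequence g."""
    G = sum(g)
    return -G * G / (2.0 * sigma)


def abs_benchmark(g):
    """L(G) = -|G|, the standard-regret benchmark in one dimension."""
    return -abs(sum(g))


print(value_by_enumeration(quadratic_benchmark, 6))  # T / 2 for sigma = 1
print(value_by_enumeration(abs_benchmark, 4))
```

Enumeration is only practical for small $T$; when $L$ depends only on the sum of the gradients, the binomial formula given later in this section avoids the exponential cost.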

Given Theorem 1 and the fact that the functions $L$ of interest will generally depend only on $g_{1:T}$, it will be useful to define $B_T$ to be the distribution of $g_{1:T}$ when each $g_t$ is drawn independently and uniformly from $\{-1, 1\}$ (that is, the sum of $T$ Rademacher random variables).

Theorem 1 immediately yields bounds for games in $n$ dimensions where the adversary is constrained to play $\|g_t\|_\infty \le 1$:

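Corollary 2. Consider the game where the player chooses $x_t \in \mathbb{R}^n$, and the adversary chooses $g_t \in [-1, 1]^n$, and the total payoff is

$$\sum_{t=1}^{T} g_t \cdot x_t - \sum_{i=1}^{n} L(g_{1:T,i})$$

for a concave function $L$. Then, the value of the game is

$$V^T = n \, \mathbb{E}_{G \sim B_T} \left[ -L(G) \right].$$

Further, the conditional value of the game is

$$V_t(g_1, \dots, g_t) = \sum_{i=1}^{n} \mathbb{E}_{G_\tau \sim B_\tau} \left[ -L(g_{1:t,i} + G_\tau) \right],$$

where $\tau = T - t$.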
Proof sketch. The proof follows by noting the constraints on both players' strategies and the value of the game fully decompose on a per-coordinate basis. □



A recipe for minimax optimal algorithms in one dimension For any function $L$,

$$\mathbb{E}_{G \sim B_T} \left[ L(G) \right] = 2^{-T} \sum_{i=0}^{T} \binom{T}{i} L(2i - T), \qquad (7)$$

since $2^{-T}\binom{T}{i}$ is the binomial probability of getting exactly $i$ gradients of $+1$ over $T$ rounds, which implies $T - i$ gradients of $-1$, so $G = i - (T - i) = 2i - T$.
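Eq. (7) translates directly into a few lines of code. The following is a minimal sketch under the assumption that $L$ is supplied as a Python callable of the gradient sum $G$; the function name `expect_over_B_T` is hypothetical.

```python
from math import comb


def expect_over_B_T(L, T):
    """E_{G ~ B_T}[L(G)] via Eq. (7): i gradients of +1 out of T give G = 2i - T."""
    return sum(comb(T, i) * L(2 * i - T) for i in range(T + 1)) / 2 ** T


# Second moment of the sum of T Rademacher variables, which equals T.
print(expect_over_B_T(lambda G: G * G, 10))
```

This costs $O(T)$ evaluations of $L$ rather than the $2^T$ leaf evaluations needed when enumerating gradient sequences.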

Since Eq. (5) gives the minimax value of the game if both players play optimally from round $t + 1$ forward, a minimax strategy for the learner on round $t + 1$ must be

$$\begin{aligned}
x_{t+1} &= \arg\min_{x \in \mathbb{R}} \max_{g \in \{-1, 1\}} \; g \cdot x + V_{t+1}(g_1, \dots, g_t, g) \\
&= \frac{1}{2} \left( V_{t+1}(g_1, \dots, g_t, -1) - V_{t+1}(g_1, \dots, g_t, +1) \right). \qquad (8)
\end{aligned}$$

The second line follows because the argmin is simply over the max of two intersecting linear functions, which we can compute in closed form as the point of intersection. Thus, if we can derive a closed form for $V_t(g_1, \dots, g_t)$, we will have an efficient minimax-optimal algorithm. In the next section, we explore cases where this is possible.
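When no closed form is at hand, the recipe can still be run numerically for benchmarks that depend only on the gradient sum. The sketch below combines Eq. (5), Eq. (7) and Eq. (8); the function names and the convention that `tau` counts the rounds still to be played (including the one being decided) are assumptions of this illustration.

```python
from math import comb


def conditional_value(L, G_t, tau):
    """V_t = E[-L(G_t + G_tau)] with G_tau the sum of tau Rademacher variables."""
    return -sum(comb(tau, i) * L(G_t + 2 * i - tau) for i in range(tau + 1)) / 2 ** tau


def minimax_play(L, G_t, tau):
    """Play for the next round via Eq. (8); after it, tau - 1 rounds remain."""
    v_minus = conditional_value(L, G_t - 1, tau - 1)
    v_plus = conditional_value(L, G_t + 1, tau - 1)
    return 0.5 * (v_minus - v_plus)


# Example: quadratic benchmark L(G) = -G^2 / 2, for which the play is -G_t.
quad = lambda G: -G * G / 2.0
print(minimax_play(quad, 3, 5))
```

The play equalizes the two branches: for either $g \in \{-1, +1\}$, the quantity $g \cdot x_{t+1} + V_{t+1}$ equals $V_t$, which is what makes the strategy minimax optimal against a $\{-1, +1\}$ adversary.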

When $L$ depends only on $G = g_{1:T}$, we may be able to run the minimax algorithm efficiently even if $V_t$ does not have a convenient closed form: if $\tau = T - t$, the number of rounds remaining, is small, then we can compute $V_t$ exactly by using the appropriate binomial probabilities (following Eq. (5) and Eq. (7)). On the other hand, if $\tau$ is large, then applying the Gaussian approximation to the binomial distribution may be sufficient.

4 Deriving Minimax Optimal Algorithms 

In this section, we explore three applications of the tools from the previous section. We begin with a relatively simple but interesting example which illustrates the technique.

4.1 Constant step-size gradient descent can be minimax optimal 

Suppose we use a "soft" feasible set for the benchmark,

$$L(G) = \min_{x} \; G x + \frac{\sigma}{2} x^2 = -\frac{1}{2\sigma} G^2, \qquad (9)$$

for a constant $\sigma > 0$. Does a no-regret algorithm against this comparison class exist? Unfortunately, the general answer is no, as shown in the next theorem:

Theorem 3. The value of this game is $V^T = \mathbb{E}_{G \sim B_T}\left[ \frac{1}{2\sigma} G^2 \right] = \frac{T}{2\sigma}$.
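Proof. Starting from Eq. (7),

$$\mathbb{E}_{G \sim B_T}\left[ G^2 \right] = 2^{-T} \sum_{i=0}^{T} \binom{T}{i} (2i - T)^2 = 2^{-T} \sum_{i=0}^{T} \binom{T}{i} \left( 4i^2 - 4iT + T^2 \right),$$

and since $\sum_{i=0}^{T} \binom{T}{i} = 2^T$, $\sum_{i=0}^{T} \binom{T}{i} i = T 2^{T-1}$, and $\sum_{i=0}^{T} \binom{T}{i} i^2 = (T + T^2) 2^{T-2}$, we have

$$\mathbb{E}_{G \sim B_T}\left[ G^2 \right] = (T + T^2) - 2T^2 + T^2 = T.$$

The result then follows from linearity of expectation. □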



This implies $\text{Reward} \ge \frac{1}{2\sigma}\left( G^2 - T \right)$, a fact noted by Streeter and McMahan [2012, Lemma 2].

Thus, for a fixed $\sigma$, we cannot have a no-regret algorithm with respect to this $L$. However, if $T$ is known in advance, we could choose $\sigma = \sqrt{T}$ in order to claim no-regret. But this is a bit arbitrary: if the player could pick $\sigma$, and cares purely about Regret, obviously he would like to play the game where $\sigma \to \infty$, as that makes the value of the game (Regret) as small as possible. However, this choice also drives Reward to zero. If the lower bound on reward is what matters, then the player should choose $\sigma$ based on how he expects $|G|$ to relate to $T$.

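To derive the minimax optimal algorithm, we can compute conditional values (using similar techniques to Theorem 3),

$$V_t(g_1, \dots, g_t) = \mathbb{E}_{G_\tau \sim B_\tau}\left[ \frac{1}{2\sigma} (g_{1:t} + G_\tau)^2 \right] = \frac{1}{2\sigma} \left( g_{1:t}^2 + (T - t) \right),$$

where $\tau = T - t$, and so following Eq. (8) the minimax-optimal algorithm must use

$$x_{t+1} = \frac{1}{4\sigma} \left( \left( (g_{1:t} - 1)^2 + (T - t - 1) \right) - \left( (g_{1:t} + 1)^2 + (T - t - 1) \right) \right) = \frac{1}{4\sigma} \left( -4 g_{1:t} \right) = -\frac{1}{\sigma} g_{1:t}.$$

Thus, a minimax-optimal algorithm is simply constant-learning-rate gradient descent with learning rate $\frac{1}{\sigma}$. Note that for a fixed $\sigma$, this is the optimal algorithm independent of $T$; this is atypical, as usually the minimax optimal algorithm depends on the horizon (as we will see in the next two cases).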
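A short simulation can make this concrete. The sketch below is illustrative only (the function name, the random seed, and the particular $T$ and $\sigma$ are assumptions): it plays $x_{t+1} = -g_{1:t}/\sigma$ and measures regret against $L(G) = -G^2/(2\sigma)$. For any sequence of gradients in $\{-1, +1\}$ the regret comes out to exactly $T/(2\sigma)$, since $G^2 = 2\sum_t g_t\, g_{1:t-1} + T$ for such sequences.

```python
import random


def gradient_descent_regret(gradients, sigma):
    """Regret of x_{t+1} = -g_{1:t} / sigma against L(G) = -G^2 / (2 sigma)."""
    G, loss = 0.0, 0.0
    for g in gradients:
        x = -G / sigma          # unprojected gradient descent, learning rate 1 / sigma
        loss += g * x
        G += g
    benchmark = -G * G / (2.0 * sigma)
    return loss - benchmark


random.seed(0)
T, sigma = 50, 2.0
signs = [random.choice((-1, 1)) for _ in range(T)]
print(gradient_descent_regret(signs, sigma), T / (2.0 * sigma))
```

Against a weaker adversary that plays gradients strictly inside $[-1, 1]$, the same algorithm incurs less regret than the game value.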

4.2 Optimal regret against hypercube adversaries 

Abernethy et al. [2008] gives a minimax optimal algorithm when the player's $x_t$ and the comparator $x$ are constrained to an $L_2$ ball, and the adversary must also select $g_t$ from an $L_2$ ball, for $n \ge 3$ dimensions.^3 In contrast, we consider regret compared to the best $x$ constrained to the unit $L_\infty$ ball, but allow the player to select any $x_t \in \mathbb{R}^n$; our adversary is constrained to select $g_t$ from the unit $L_\infty$ ball (the generalization to arbitrary hyper-rectangles is straightforward). Perhaps surprisingly, the optimal strategy for the player always plays from the unit $L_\infty$ ball as well, so our results immediately apply to the case of the constrained player.

^3 Their results are actually more general than this, allowing the constraint on $\|g_t\|_2$ to vary on a per-round basis. Our work could also be extended in that manner.

Since we consider $L_\infty$ constraints on both the comparator and adversary, Corollary 2 implies it is sufficient to study the one-dimensional case. We consider the standard notion of regret, taking $L(G) = -|G|$ following Eq. (2). Our main result is the following:

Theorem 4. Consider the game between an adversary who chooses loss functions $g_t \in [-1, 1]$, and a player who chooses $x_t \in \mathbb{R}$. For a given sequence of plays, $x_1, g_1, x_2, g_2, \dots, x_T, g_T$, the value to the adversary is $\sum_{t=1}^{T} g_t x_t + |g_{1:T}|$. Then, when $T$ is even with $T = 2M$, the minimax value of this game is given by

$$V^T = 2^{-T} \frac{2M \, T!}{(T - M)! \, M!} \le \sqrt{\frac{2T}{\pi}}.$$

Further, as $T \to \infty$, $V^T \to \sqrt{\frac{2T}{\pi}}$.
Proof. Letting $T = 2M$ and working from Eq. (7),

$$V^T = \mathbb{E}_{G \sim B_T}\left[ -L(G) \right] = 2^{-T} \sum_{i=0}^{T} \binom{T}{i} |2i - T| = 2^{-T} \, 2M \binom{2M}{M} = 2^{-T} \frac{2M \, T!}{(T - M)! \, M!}, \qquad (10)$$

where we have applied a classic formula of de Moivre [1718] for the mean absolute deviation of the binomial distribution (see also Diaconis and Zabell [1991]). Using a standard bound on the central binomial coefficient (based on Stirling's formula),

$$\binom{2M}{M} = \frac{4^M}{\sqrt{\pi M}} \left( 1 - \frac{c_M}{M} \right), \qquad (11)$$

where $\frac{1}{9} < c_M < \frac{1}{8}$ for all $M \ge 1$, we have

$$V^T \le \frac{2M}{\sqrt{\pi M}} = \sqrt{\frac{2T}{\pi}}.$$

As implied by Eq. (11), this inequality quickly becomes tight as $T \to \infty$.

□
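The closed form and its bound are easy to compare numerically. The following sketch evaluates the exact value for even $T$ alongside $\sqrt{2T/\pi}$; the function names are hypothetical and the code simply transcribes the expression $2^{-T}\, 2M \binom{2M}{M}$.

```python
from math import comb, pi, sqrt


def linf_game_value(T):
    """Exact value E|G| for G ~ B_T with T = 2M even, via 2^{-T} * 2M * C(2M, M)."""
    if T % 2 != 0:
        raise ValueError("this closed form is stated for even T")
    M = T // 2
    return 2 * M * comb(2 * M, M) / 2 ** T


def asymptotic_value(T):
    """The upper bound sqrt(2T / pi)."""
    return sqrt(2 * T / pi)


for T in (2, 10, 100, 1000):
    print(T, linf_game_value(T), asymptotic_value(T))
```

Python integers have arbitrary precision, so the binomial coefficient is computed exactly and only the final division is rounded.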



The minimax algorithm (for the constrained player, too!) In order to compute the minimax algorithm, we would like a closed form for

$$V_t(G_t) = -\mathbb{E}_{G_\tau \sim B_\tau}\left[ L(G_t + G_\tau) \right],$$

where $G_t = g_{1:t}$ is the sum of the gradients so far, $\tau = T - t$ is the number of rounds to go, and $G_\tau = g_{t+1:T}$ is a random variable giving the sum of the remaining gradients. Unfortunately, the structure of the binomial coefficients exploited by de Moivre and used in Eq. (10) does not apply given an arbitrary offset $G_t$. Nevertheless, we will be able to derive a formula for the update that is readily computable. Writing $\Pr_\tau(b)$ for the probability a random draw from $B_\tau$ has value $b$, the update of Eq. (8) becomes

$$x_{t+1} = \frac{1}{2} \sum_{b=-\tau}^{\tau} \Pr_\tau(b) \left( |G_t + b - 1| - |G_t + b + 1| \right).$$

Whenever $G_t + b \ge 1$, the difference in absolute values is $-2$, and whenever $G_t + b \le -1$, the difference is $2$. When $G_t + b = 0$, the difference is zero. Thus,

$$\begin{aligned}
x_{t+1} &= \frac{1}{2} \left( \Pr_\tau(b > -G_t)(-2) + \Pr_\tau(b < -G_t)(2) \right) \\
&= \Pr_\tau(b < -G_t) - \Pr_\tau(b > -G_t). \qquad (12)
\end{aligned}$$

While this update does not have a closed form, it can be efficiently computed numerically.^4 It follows from this expression that even though we allow the player to select $x_{t+1} \in \mathbb{R}$, the minimax optimal algorithm always selects points from $[-1, 1]$. Thus, we have the following corollary:

Corollary 5. Consider the game of Theorem 4, but suppose now we also constrain the player to choose $x_t \in [-1, 1]$. This does not change the value of the game, as the minimax algorithm for the unconstrained case always plays from $[-1, 1]$ regardless.
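The update can be computed with exact binomial probabilities when the horizon is moderate. The sketch below is one reading of the update, not a definitive implementation: it takes the random sum $b$ over the `n_later` rounds that remain after the round being played (the argument that enters $V_{t+1}$ in Eq. (8)), and the function names are hypothetical.

```python
from math import comb


def pmf_rademacher_sum(n, b):
    """Probability that the sum of n independent uniform draws from {-1, +1} equals b."""
    if abs(b) > n or (b + n) % 2 != 0:
        return 0.0
    return comb(n, (b + n) // 2) / 2 ** n


def linf_minimax_play(G_t, n_later):
    """Pr(b < -G_t) - Pr(b > -G_t), with b summed over the n_later rounds after this one."""
    support = range(-n_later, n_later + 1)
    less = sum(pmf_rademacher_sum(n_later, b) for b in support if b < -G_t)
    more = sum(pmf_rademacher_sum(n_later, b) for b in support if b > -G_t)
    return less - more


def direct_play(G_t, n_later):
    """The same play computed from the absolute-value form, as a cross-check."""
    support = range(-n_later, n_later + 1)
    return 0.5 * sum(pmf_rademacher_sum(n_later, b) * (abs(G_t + b - 1) - abs(G_t + b + 1))
                     for b in support)


print(linf_minimax_play(2, 5), direct_play(2, 5))
```

Because the result is a difference of two probabilities of disjoint events, it necessarily lies in $[-1, 1]$, which is the observation behind Corollary 5.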



Abernethy et al. [2008] shows that for the linear game with $n \ge 3$ where both the learner and adversary select vectors from the unit sphere, the minimax value is exactly $\sqrt{T}$. Interestingly, in the $n = 1$ case (where $L_2$ and $L_\infty$ coincide), the value of the game is lower, about $\sqrt{2T/\pi}$ rather than $\sqrt{T}$. This indicates a fundamental difference in the geometry of the $n = 1$ space and $n \ge 3$. We conjecture the minimax value for the $L_2$ game with $n = 2$ lies somewhere in between.

4.3 Non-stochastic betting and No-regret for all feasible sets simultaneously

We derive a minimax optimal approach to the betting problem presented in Section 2, which also corresponds to the setting introduced by Streeter and McMahan [2012]. Again, it is sufficient to consider the one-dimensional case. In that work, the goal was to prove bounds like $\text{Regret} \le O\left(R\sqrt{T}\log((1 + R)T)\right)$ simultaneously for any comparator $x$ with $|x| = R$. Stating their Theorem 1 in terms of losses, this bound is achieved by any algorithm that guarantees

$$\text{Loss} = \sum_{t=1}^{T} g_t x_t \le -\exp\left( \frac{|g_{1:T}|}{\sqrt{T}} \right) + O(1). \qquad (13)$$

Note that whenever $|g_{1:T}|$ is large compared to $\sqrt{T}$ the algorithm must achieve significantly negative loss (positive reward).

^4 The CDF of the binomial distribution can be computed numerically using the regularized incomplete beta function, from which $\Pr_\tau(b < -G)$ can be derived. Then, $\Pr_\tau(b = -G)$ can be computed from the appropriate binomial coefficient, leading to both needed probabilities.
We initially study the game where

$$L(G) = -\exp\left( \frac{G}{\sqrt{T}} \right) \qquad (14)$$

(note $G = g_{1:T} \in [-T, T]$ can be positive or negative). We prove the minimax algorithm achieves $\sum_{t=1}^{T} g_t x_t - L(g_{1:T}) \le \sqrt{e}$, implying $\text{Reward} = -\sum_{t=1}^{T} g_t x_t \ge \exp\left( G/\sqrt{T} \right) - \sqrt{e}$. Thus, this algorithm guarantees large reward whenever the gradient sum $G$ is large and positive. In order to satisfy Eq. (13), we must also achieve large reward when $G$ is large and negative. Since $L(g_{1:T}) + L(-g_{1:T}) \le -\exp\left( |g_{1:T}|/\sqrt{T} \right)$, this can be accomplished by running two copies of the minimax algorithm simultaneously, switching the signs of the gradients and plays of the second copy. We formalize this in Appendix A.



Interpretation as a soft feasible set Before developing an algorithm it is worth noting an alternative characterization of this benchmark function. One can show that, for $a > 0$,

$$\min_{x \in \mathbb{R}^-} \left( G x - a x \log(-a x) + a x \right) = -\exp\left( \frac{G}{a} \right).$$
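The identity can be verified with a short calculus argument. The following is a sketch of that derivation; the names $f$ for the objective and $x^*$ for its minimizer are introduced here only for illustration.

```latex
\begin{align*}
f(x) &= G x - a x \log(-a x) + a x, \qquad x < 0, \\
f'(x) &= G - a \log(-a x) - a + a = G - a \log(-a x), \\
f''(x) &= -\frac{a}{x} > 0 \quad \text{for } x < 0, \text{ so } f \text{ is convex on } x < 0. \\
f'(x^*) = 0 \;&\Longrightarrow\; -a x^* = \exp(G / a)
  \;\Longrightarrow\; x^* = -\tfrac{1}{a} \exp(G / a), \\
f(x^*) &= G x^* - a x^* \cdot \frac{G}{a} + a x^* = a x^* = -\exp\!\left( \frac{G}{a} \right).
\end{align*}
```

Since $f(x) \to 0$ as $x \to 0^-$ and $-\exp(G/a) < 0$, the boundary point does not improve on $x^*$.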

Thus, if we take $\Psi(x) = -a x \log(-a x) + a x + I(x \le 0)$, we have

$$\min_{x \in \mathbb{R}} \; g_{1:T} \, x + \Psi(x) = -\exp\left( \frac{g_{1:T}}{a} \right).$$

Since this algorithm needs large Reward when $G$ is large and positive, we might expect that the minimax optimal algorithm only plays $x_t \le 0$. Another intuition for this is that the algorithm should not need to play any point $x$ to which $\Psi$ assigns an infinite penalty. This intuition is confirmed by the analysis of this "one-sided" algorithm:

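Theorem 6. Consider the game with benchmark $L$ as defined in Eq. (14). The minimax value of this game is exactly

$$V^T = \cosh\left( \frac{1}{\sqrt{T}} \right)^{T} \le \sqrt{e},$$

and further $\lim_{T \to \infty} V^T = \sqrt{e}$. Letting $\tau = T - t$ be the number of rounds left to be played, and defining $G_t = g_{1:t}$, the conditional value of the game is

$$V_t(G_t) = 2^{-\tau} \exp\left( \frac{G_t - \tau}{\sqrt{T}} \right) \left( 1 + \exp\left( 2/\sqrt{T} \right) \right)^{\tau},$$

which leads to the minimax optimal algorithm^5 for the player

$$x_{t+1} = -2^{-\tau} \exp\left( \frac{G_t - \tau}{\sqrt{T}} \right) \left( \exp\left( \frac{2}{\sqrt{T}} \right) - 1 \right) \left( \exp\left( \frac{2}{\sqrt{T}} \right) + 1 \right)^{\tau - 1} < 0. \qquad (15)$$

^5 When computing the player's strategy via Eq. (15), it is numerically preferable to do the calculation in log-space, and then exponentiate to get the final play.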
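Proof. First, we compute the value of the game:

$$\begin{aligned}
V^T = \mathbb{E}_{G \sim B_T}\left[ \exp\left( \frac{G}{\sqrt{T}} \right) \right]
&= 2^{-T} \sum_{i=0}^{T} \binom{T}{i} \exp\left( \frac{2i - T}{\sqrt{T}} \right) \\
&= 2^{-T} \exp\left( -\sqrt{T} \right) \sum_{i=0}^{T} \binom{T}{i} \exp\left( 2/\sqrt{T} \right)^{i} \\
&= 2^{-T} \exp\left( -\sqrt{T} \right) \left( 1 + \exp\left( 2/\sqrt{T} \right) \right)^{T},
\end{aligned}$$

where we have used the ordinary generating function, $\sum_{i=0}^{T} \binom{T}{i} x^i = (1 + x)^T$. Manipulating the above expression for the value of the game, we arrive at $V^T = \cosh\left( 1/\sqrt{T} \right)^T$. Using the series expansion for cosh leads to the upper bound $\cosh(x) \le \exp(x^2/2)$, from which we conclude

$$V^T = \cosh\left( 1/\sqrt{T} \right)^{T} \le \exp\left( \frac{1}{2T} \right)^{T} = \sqrt{e}.$$

Using similar techniques, we can derive the conditional value of the game, letting $\tau = T - t$ be the number of rounds left to be played:

$$V_t(G_t) = 2^{-\tau} \sum_{i=0}^{\tau} \binom{\tau}{i} \exp\left( \frac{G_t + 2i - \tau}{\sqrt{T}} \right) = 2^{-\tau} \exp\left( \frac{G_t - \tau}{\sqrt{T}} \right) \left( 1 + \exp\left( 2/\sqrt{T} \right) \right)^{\tau}.$$

Following Eq. (8) and simplifying leads to the update of Eq. (15). It remains to show $\lim_{T \to \infty} V^T = \sqrt{e}$. Using the change of variable $x = 1/\sqrt{T}$, equivalently we consider $\lim_{x \to 0} \cosh(x)^{1/x^2}$. Examining the log of this function,

$$\lim_{x \to 0} \log\left( \cosh(x)^{1/x^2} \right) = \lim_{x \to 0} \frac{1}{x^2} \log\cosh(x) = \lim_{x \to 0} \frac{1}{x^2} \left( \frac{x^2}{2} - \frac{x^4}{12} + \frac{x^6}{45} - \frac{17 x^8}{2520} + \cdots \right) = \frac{1}{2},$$

where we have taken the Maclaurin series of $\log\cosh(x)$. Using the continuity of exp, we have

$$\lim_{x \to 0} \cosh(x)^{1/x^2} = \exp\left( \lim_{x \to 0} \log\left( \cosh(x)^{1/x^2} \right) \right) = \sqrt{e}.$$

□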
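The one-sided algorithm is simple to implement in log-space. The sketch below uses an algebraically equivalent rewriting of the update, $x_{t+1} = -\exp(G_t/\sqrt{T}) \sinh(1/\sqrt{T}) \cosh(1/\sqrt{T})^{\tau - 1}$, which follows from the conditional value $V_t(G_t) = \exp(G_t/\sqrt{T}) \cosh(1/\sqrt{T})^{\tau}$; this rewriting and the function names are assumptions of the illustration rather than the form stated in the theorem.

```python
import math


def one_sided_play(G_t, tau, T):
    """Play for the next round when tau = T - t rounds remain, computed in log-space."""
    r = 1.0 / math.sqrt(T)
    log_magnitude = G_t * r + math.log(math.sinh(r)) + (tau - 1) * math.log(math.cosh(r))
    return -math.exp(log_magnitude)  # the one-sided algorithm only plays negative points


def one_sided_regret(gradients):
    """Loss - L(G) for L(G) = -exp(G / sqrt(T)) when following the one-sided algorithm."""
    T = len(gradients)
    G, loss = 0, 0.0
    for t, g in enumerate(gradients):
        x = one_sided_play(G, T - t, T)
        loss += g * x
        G += g
    return loss + math.exp(G / math.sqrt(T))


T = 20
print(one_sided_regret([1] * T), math.cosh(1 / math.sqrt(T)) ** T, math.sqrt(math.e))
```

Because the play equalizes the two branches of the game tree, every sequence of gradients in $\{-1, +1\}$ yields regret exactly equal to the game value, which is below $\sqrt{e}$.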



A strong lower-bound Recall from Section 2 that as long as $-L(G) \ge 0$ and we get constant regret with respect to $L$, we can scale our bets so that we never risk losing more than a constant starting budget. This holds for any number of rounds $T$ against any adversary. Given that constraint, we would like $-L(G)$ to grow as fast as possible, so it is natural to consider generalizing Eq. (14) as

$$L_\alpha(G) = -\exp\left( \frac{G}{T^\alpha} \right)$$

for an exponent $\alpha > 0$. Following the techniques used in the preceding proof, we can show for this game

$$V^T = \mathbb{E}_{G \sim B_T}\left[ -L_\alpha(G) \right] = 2^{-T} \exp\left( -T^{1-\alpha} \right) \left( 1 + \exp\left( 2T^{-\alpha} \right) \right)^{T} = \cosh\left( T^{-\alpha} \right)^{T}.$$
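By taking the first term in the series for $\log\cosh(x)$, namely $x^2/2$, and plugging in $x = 1/T^\alpha < 1$, we get a good upper bound on the value of the game:

$$V^T = \exp\left( T \log\cosh\left( T^{-\alpha} \right) \right) \le \exp\left( T \cdot \tfrac{1}{2} T^{-2\alpha} \right) = \exp\left( \tfrac{1}{2} T^{1 - 2\alpha} \right).$$

The bound is tight up to lower-order terms in the exponent, since the next term of the series, $-x^4/12$, contributes only $-T^{1-4\alpha}/12$; hence for $\alpha < 1/2$ the value grows without bound as $T \to \infty$. This implies that, for any $\alpha < 1/2$, no algorithm can provide constant loss (that is, $\sum_{t=1}^{T} g_t x_t \le k$ for a constant $k > 0$) for any sequence while also guaranteeing

$$\text{Reward} = -\sum_{t=1}^{T} g_t x_t = \Omega\left( \exp\left( \frac{G}{T^\alpha} \right) \right). \qquad (16)$$

In fact, for $\alpha < 1/2$, no algorithm can guarantee even linear loss in the worst case while making the reward guarantee of Eq. (16).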
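The dependence of the game value on the exponent can be seen numerically. The sketch below evaluates $\log V^T = T \log\cosh(T^{-\alpha})$ for a few exponents; the function name and the sample values of $\alpha$ and $T$ are chosen here for illustration, and the computation is done in logs to avoid overflow.

```python
import math


def log_value(T, alpha):
    """log of cosh(T^-alpha)^T, the game value for L_alpha(G) = -exp(G / T^alpha)."""
    return T * math.log(math.cosh(T ** -alpha))


for alpha in (0.25, 0.4, 0.5):
    print(alpha, [round(log_value(T, alpha), 3) for T in (10**2, 10**4, 10**6)])
```

For $\alpha = 1/2$ the log-value stays below $1/2$, while for smaller exponents it grows polynomially in $T$, so the value itself grows faster than any polynomial.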
A A Symmetric Betting Algorithm

The one-sided algorithm of Theorem 6 has

$$\text{Loss} = \text{Regret} + L(G) \le -\exp\left( \frac{G}{\sqrt{T}} \right) + \sqrt{e},$$

so in order to do well when $g_{1:T}$ is large and negative, we can run a copy of the algorithm on $-g_1, \dots, -g_T$, switching the signs of each $x_t$ it suggests. The combined algorithm then satisfies

$$\text{Loss} \le -\exp\left( \frac{G}{\sqrt{T}} \right) - \exp\left( \frac{-G}{\sqrt{T}} \right) + 2\sqrt{e} \le -\exp\left( \frac{|G|}{\sqrt{T}} \right) + 2\sqrt{e},$$

and so following Eq. (13) and Theorem 1 of Streeter and McMahan [2012], we obtain the desired regret bounds. The following theorem implies the symmetric algorithm is in fact minimax optimal with respect to the combined benchmark

$$L_C(G) = -\exp\left( \frac{G}{\sqrt{T}} \right) - \exp\left( \frac{-G}{\sqrt{T}} \right).$$

Theorem 7. Consider two 1-D games where the adversary plays from $[-1, 1]$, defined by concave functions $L_1$ and $L_2$ respectively. Let $x^1$ and $x^2$ be minimax-optimal plays for $L_1$ and $L_2$ respectively, given that $g_1, \dots, g_{t-1}$ have been played so far in both games. Then $x^1 + x^2$ is also minimax optimal for the combined game that uses the benchmark $L_C(G) = L_1(G) + L_2(G)$.

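Proof. First, taking $\tau = T - t$ and using Theorem 1 three times, we have

$$\begin{aligned}
V^C_t(g_1, \dots, g_t) &= -\mathbb{E}_{G_\tau \sim B_\tau}\left[ L_1(g_{1:t} + G_\tau) + L_2(g_{1:t} + G_\tau) \right] \\
&= V^1_t(g_1, \dots, g_t) + V^2_t(g_1, \dots, g_t),
\end{aligned}$$

using linearity of expectation. Then, using Eq. (8) for each of the three games, we have

$$\begin{aligned}
\arg\min_{x} \max_{g} \; g x + V^C_t(g_1, \dots, g_{t-1}, g)
&= \frac{1}{2} \left( V^C_t(g_1, \dots, g_{t-1}, -1) - V^C_t(g_1, \dots, g_{t-1}, +1) \right) \\
&= \frac{1}{2} \left( V^1_t(g_1, \dots, g_{t-1}, -1) + V^2_t(g_1, \dots, g_{t-1}, -1) - V^1_t(g_1, \dots, g_{t-1}, +1) - V^2_t(g_1, \dots, g_{t-1}, +1) \right) \\
&= x^1 + x^2.
\end{aligned}$$

□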
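The symmetric algorithm can be sketched as the sum of the one-sided play and its mirrored copy. The code below assumes the rewriting $x_{t+1} = -\exp(G_t/\sqrt{T}) \sinh(1/\sqrt{T}) \cosh(1/\sqrt{T})^{\tau-1}$ for the one-sided play (an algebraic simplification used here for illustration) and hypothetical function names; it is a sketch, not a definitive implementation.

```python
import math


def symmetric_play(G_t, tau, T):
    """Sum of the one-sided play on g and the sign-flipped play of a copy run on -g."""
    r = 1.0 / math.sqrt(T)
    scale = math.sinh(r) * math.cosh(r) ** (tau - 1)
    x_pos = -math.exp(G_t * r) * scale   # one-sided algorithm on the original gradients
    x_neg = math.exp(-G_t * r) * scale   # copy run on -g, with the sign of its play switched
    return x_pos + x_neg


def symmetric_regret(gradients):
    """Loss - L_C(G) for the combined benchmark L_C(G) = -exp(G/sqrt(T)) - exp(-G/sqrt(T))."""
    T = len(gradients)
    r = 1.0 / math.sqrt(T)
    G, loss = 0, 0.0
    for t, g in enumerate(gradients):
        loss += g * symmetric_play(G, T - t, T)
        G += g
    return loss + math.exp(G * r) + math.exp(-G * r)


T = 16
print(symmetric_regret([-1] * T), 2 * math.cosh(1 / math.sqrt(T)) ** T)
```

For gradients in $\{-1, +1\}$ the regret against $L_C$ equals twice the one-sided game value, consistent with the additive decomposition of the conditional values in the proof above.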